Abstract

Extending previous analyses on function classes like linear functions, we analyze how the 
simple (1+1) evolutionary algorithm optimizes pseudo-Boolean functions that are strictly 
monotone. Contrary to what one would expect, not all of these functions are easy to 
optimize. The choice of the constant c in the mutation probability p(n) = c/n can make a
decisive difference.

We show that if c < 1, then the (1+1) EA finds the optimum of every such function in 
O(n log n) iterations. For c = 1, we can still prove an upper bound of $O(n^{3/2})$. However, for
c > 33, we present a strictly monotone function such that the (1+1) EA with overwhelming
probability does not find the optimum within $2^{\Omega(n)}$ iterations. This is the first time that we
observe that a constant factor change of the mutation probability changes the run-time by 
more than constant factors. 

1 Introduction 

Rigorously understanding how randomized search heuristics solve optimization problems and 
proving guarantees for their performance remains a challenging task. The current state of the 
art is that we can analyze some heuristics for simple problems. Nevertheless, this gave new 
insight, helped to get rid of wrong beliefs, and turned correct beliefs into proven facts. 

For example, it was long believed that a pseudo-Boolean function $f\colon \{0,1\}^n \to \mathbb{R}$ is easy to
optimize if it is unimodal, that is, if each $x \in \{0,1\}^n$ that is not optimal has a Hamming neighbor
y with f(y) > f(x) [Müh92]. Recall that y is called a Hamming neighbor of x if x and y differ
in exactly one bit. 

This belief was debunked in [DJW98]. There the unimodal long k-path function [HGD94] was
considered and it was proven that the simple (1+1) evolutionary algorithm ((1+1) EA) with high
probability does not find the optimum within $2^{\sqrt{n}}$ iterations. This classical episode shows how
important it is to support an intuitive understanding of evolutionary algorithms with rigorous 
proofs. 

It also shows that it is very difficult to identify problem classes that are easy for a particular 
randomized search heuristic. This, however, is needed for a successful application of such methods,
because the no free lunch theorems [IT04] tell us, in simple words, that no randomized search
heuristic can be superior to another if we do not restrict the problem class we are interested in.

1.1 Previous Work



In the following, we restrict ourselves to classes of simple pseudo-Boolean functions. We stress 
that the last ten years also produced a number of results on combinatorial problems [OHY]. At
the same time research on classical test functions and function classes continued, spurred by the 
many still open problems. 

We also restrict ourselves to one of the simplest randomized search heuristics, the
(1+1) EA. The first rigorous results on this heuristic were given by Mühlenbein [Müh92], who
determined how long it takes to find the optimum of simple test functions like
$\textsc{OneMax}(x) := \sum_{i=1}^{n} x_i$, counting the number of 1-bits. Quite some time later, and with much more
technical effort necessary, Droste, Jansen and Wegener [DJW02] extended the O(n log n) bound to
all linear functions $f(x) := \sum_{i=1}^{n} a_i x_i$. Since it was hard to believe that such a simple result
should have such a complicated proof, this work initiated a sequence of follow-up results, in
particular introducing drift analysis to the community [HY01, HY02] or refining it for our
purposes [Jäg08, DFW10, DJW10]. However, not all promising looking function classes are easy
to optimize. As laid out in the first paragraphs of this paper, already unimodal functions are
difficult.

Almost all results described above were proven for the standard mutation probability 1/n. It 
is easy to see from their proofs (or, in the case of linear functions, cf. [DG10]), that all results
remain true for p(n) = c/n, where c can be an arbitrary constant. 

We should add that the question of how to determine the right mutation probability is also
far from being settled. Most theory results for simplicity take the value p(n) = 1/n, but it is
known that this is not always optimal [JW00]. In practical applications, similarly 1/n is the
most recommended static choice for the mutation probability [BFM97, Och02] in spite of known
limitations of this choice [CS09].

1.2 Our Work 

In this work, we regard the class of strictly monotone functions. A pseudo-Boolean function 
is called strictly monotone (or simply monotone in the following) if any mutation flipping at 
least one 0-bit into a 1-bit and no 1-bit into a 0-bit strictly increases the function value. Hence 
much stronger than for unimodal functions, we not only require that each non-optimal x has 
a Hamming neighbor with better f-value, but we even ask that this holds for all Hamming
neighbors that have an additional 1-bit. 

Obviously, the class of monotone functions includes linear functions with all bit weights 
positive. On the other hand, each monotone function is unimodal. Contrary to the long k-path
function, there is always a short path of at most n search points with increasing f-value
connecting a search point to the optimum.

It is easy to see that monotone functions are just the ones where a simple coupon collector 
argument shows that random local search finds the optimum in time O(n log n). Surprisingly, we
find that monotone functions are not easy to optimize for the (1+1) EA in general. Moreover, our
results show that for this class of functions, the mutation probability p(n) = c/n, c a constant, 
can make a crucial difference. 
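
The coupon collector argument for random local search can be made concrete with a small calculation. The sketch below rests on our own assumptions: random local search flips exactly one uniformly chosen bit per step and accepts the result if the fitness does not decrease, and the name `rls_expected_time` is hypothetical. On a strictly monotone function a flip of a 0-bit is always accepted and a flip of a 1-bit is always rejected, so with k remaining 0-bits the expected waiting time for the next improvement is n/k.

```python
from fractions import Fraction


def rls_expected_time(n: int, zeros: int) -> Fraction:
    """Expected number of steps until random local search reaches 1^n.

    Assumption (ours): one uniformly random bit is flipped per step and
    the offspring is accepted iff the fitness does not decrease. For a
    strictly monotone function only flips of 0-bits are accepted, so the
    expected time from a point with `zeros` 0-bits is sum_{k=1}^{zeros} n/k.
    """
    return sum((Fraction(n, k) for k in range(1, zeros + 1)), Fraction(0))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Worst case: start in the all-zeros string.
        print(n, float(rls_expected_time(n, n)))
```

The sum equals n times the Harmonic number of the initial number of 0-bits, which is where the O(n log n) bound comes from.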

More precisely, we show that for c < 1 the (1+1) EA with mutation probability c/n finds 
the optimum of any monotone function in time O(nlogn), which is best possible given previous 
results on linear functions. For c = 1, the drift argument breaks down and we have to resort to an
upper bound of $O(n^{3/2})$ based on a related model by Jansen [Jan07]. We currently do not know
whether this bound is tight. As lower bound, we only have the general lower bound $\Omega(n \log n)$ for all
mutation-based evolutionary algorithms.

If c is sufficiently large, an unexpected change of regime happens. For c > 33, we show
that there are monotone functions such that with overwhelming probability, the (1+1) EA does 
not find the optimum in exponential time. The construction of such functions heavily uses 
probabilistic methods. To the best of our knowledge, this is the first time that problem instances 
are constructed this way in the theory of evolutionary computation. 

2 Preliminaries 

We consider the maximization of a pseudo-Boolean function $f\colon \{0,1\}^n \to \mathbb{R}$ by means of a simple
evolutionary algorithm, the (1+1) EA. The results can easily be adapted for minimization. In 
this work, n always denotes the number of bits in the representation. 

The (1+1) EA (Algorithm 1) maintains a population of size 1. In each generation it creates
a single offspring by independently flipping each bit in the current search point with a fixed
mutation probability p(n). The new search point replaces the old one in case its f-value is not
worse.



Algorithm 1 (1+1) EA with mutation probability p(n)
1: Initialization: Choose $x \in \{0,1\}^n$ uniformly at random.
2: repeat forever
3:   Create $y \in \{0,1\}^n$ by copying x.
4:   Mutation: Flip each bit in y independently with probability p(n).
5:   Selection: if $f(y) \ge f(x)$ then x := y.



In our analyses we denote by mut(x) the bit string that results from a mutation of x. We
denote as $x^+$ the search point that results from a mutation of x and a subsequent selection.
Formally, y = mut(x), and $x^+ = y$ if $f(y) \ge f(x)$ and $x^+ = x$ otherwise.
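
The algorithm is short enough to be written out directly. The following Python sketch is an illustration, not the authors' implementation: the function name `one_plus_one_ea`, the `seed` and `max_iters` parameters, and the stopping rule are our assumptions. In particular, Algorithm 1 runs forever, whereas the sketch stops once the all-ones string is reached (the unique optimum of a monotone function) or the iteration budget is exhausted.

```python
import random
from typing import Callable, List, Tuple


def one_plus_one_ea(
    f: Callable[[List[int]], float],
    n: int,
    c: float,
    max_iters: int,
    seed: int = 0,
) -> Tuple[List[int], int]:
    """Run the (1+1) EA with mutation probability p(n) = c/n.

    Returns the final search point and the number of mutations performed.
    Assumption (ours): we stop at the all-ones string, which is the
    optimum for every strictly monotone function.
    """
    rng = random.Random(seed)
    p = c / n
    # Initialization: uniformly random bit string.
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x)
    for t in range(max_iters):
        if sum(x) == n:
            return x, t
        # Mutation: flip each bit independently with probability p.
        y = [1 - b if rng.random() < p else b for b in x]
        fy = f(y)
        # Selection: accept the offspring if it is not worse.
        if fy >= fx:
            x, fx = y, fy
    return x, max_iters


if __name__ == "__main__":
    # OneMax is simply the number of 1-bits.
    point, steps = one_plus_one_ea(sum, n=50, c=0.9, max_iters=100_000)
    print(sum(point), steps)
```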

For $x = x_1 \cdots x_n$ let $Z(x)$ describe the positions of all 0-bits in x, $Z(x) := \{1 \le i \le n \mid x_i = 0\}$.
By $|x|_0 = |Z(x)|$ we denote the number of 0-bits in x and by $|x|_1 = n - |x|_0$ we denote the
number of 1-bits. For $k \in \mathbb{N}$ let $[k] := \{1, 2, \dots, k\}$. For a set $I = \{i_1, i_2, \dots, i_\ell\} \subseteq [n]$ we write
$x|_I = x_{i_1} x_{i_2} \cdots x_{i_\ell}$ for the sub-string of x with the bits selected by I. To simplify notation we
assume that any time we consider some $r \in \mathbb{R}_0^+$ but in fact need some $r' \in \mathbb{N}_0$, r
is silently replaced by $\lfloor r \rfloor$ or $\lceil r \rceil$ as appropriate.

We are interested in the optimization time, defined as the number of mutations until a 
global optimum is found. For the (1+1) EA this is an accurate measure of the actual run time. 
For bounds on the optimization time we use common asymptotic notation. A run time bound is 
called exponential if it is $2^{\Omega(n)}$. We also say that an event A occurs with overwhelming probability
(w.o.p.) if $1 - \Pr(A) = 2^{-\Omega(n)}$.

A function f is called linear if it can be written as $f(x) := \sum_{i=1}^{n} a_i x_i$ for weights $a_1, \dots, a_n \in \mathbb{R}$.
The most simple linear function is the function $\textsc{OneMax}(x) := \sum_{i=1}^{n} x_i = |x|_1$. Another
intensively studied linear function is $\textsc{BinVal}(x) := \sum_{i=1}^{n} 2^{n-i} x_i$. As $2^{n-i} > \sum_{j=i+1}^{n} 2^{n-j}$, the
bit value of some bit i dominates the effect of all bits i + 1, ..., n on the function value. Both
functions will later be needed in our construction.
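
Both test functions are easy to state in code. The sketch below uses our own function names `one_max` and `bin_val` and represents x as a Python list whose first entry is $x_1$, so the first bit carries the weight $2^{n-1}$.

```python
from typing import Sequence


def one_max(x: Sequence[int]) -> int:
    """OneMax(x): the number of 1-bits in x."""
    return sum(x)


def bin_val(x: Sequence[int]) -> int:
    """BinVal(x) = sum_i 2^(n-i) * x_i, with x[0] playing the role of x_1."""
    n = len(x)
    return sum(bit << (n - 1 - i) for i, bit in enumerate(x))


if __name__ == "__main__":
    # A single leading 1-bit outweighs all later bits together.
    print(bin_val([1, 0, 0, 0]), bin_val([0, 1, 1, 1]))
```

The printed pair illustrates the dominance property: the weight of bit i exceeds the sum of all weights of the bits after it.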

For two search points $x, y \in \{0,1\}^n$, we write $x \le y$ if $x_i \le y_i$ holds for all $1 \le i \le n$. We
write $x < y$ if $x \le y$ and $x \ne y$ hold. We call f a strictly monotone function (usually called
simply monotone in the following) if for all $x, y \in \{0,1\}^n$ with $x < y$ it holds that $f(x) < f(y)$.
Observe that the above condition is equivalent to $f(x) < f(y)$ for all x and y such that x and y
differ in exactly one bit and this bit has value 1 in y. In other words, every mutation that
only flips bits with value 0 strictly increases the function value. Clearly, the all-ones bit string
$1^n$ is the unique global optimum for a monotone function.
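
For small n the single-bit characterization of monotonicity can be checked exhaustively. The following sketch is our own illustration; the name `is_strictly_monotone` is hypothetical and the check takes time exponential in n, so it is only meant for toy examples.

```python
from itertools import product
from typing import Callable, Tuple


def is_strictly_monotone(f: Callable[[Tuple[int, ...]], float], n: int) -> bool:
    """Check f(x) < f(y) for all x, y that differ in exactly one bit
    where that bit is 0 in x and 1 in y (brute force over {0,1}^n)."""
    for x in product((0, 1), repeat=n):
        for j in range(n):
            if x[j] == 0:
                y = x[:j] + (1,) + x[j + 1:]
                if not f(x) < f(y):
                    return False
    return True


if __name__ == "__main__":
    print(is_strictly_monotone(sum, 4))   # OneMax
    print(is_strictly_monotone(max, 4))   # not strictly monotone
```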

For the (1+1) EA, the difficulty of monotone functions strongly depends on the mutation 
probability p(n). We are interested in mutation probabilities p(n) = c/n for some constant 
$c \in \mathbb{R}^+$. For constants c < 1, on average less than one bit flips in a single mutation. If this is
a 1-bit we have $f(x) > f(\mathrm{mut}(x))$ and $x = x^+$ holds. Otherwise, $f(x^+) > f(x)$ holds and we
accept this move. This way the number of 0-bits is quickly reduced to 0 and the unique global
optimum is found. Using drift analysis this reasoning can easily be made precise. We state the 
result here and omit the proof due to space restrictions. 

Theorem 2. The expected optimization time of the (1+1) EA with mutation probability p(n) = 
c/n, 0 < c < 1 constant, is O(n log n) for every monotone function.

The proof of Theorem 2 breaks down for c = 1. In this case the drift in the number of
1-bits can be bounded pessimistically by a model due to Jansen [Jan07] where we consider a
random process that mutates x to y with mutation probability p(n) = 1/n and replaces x by y
if either $x \le y$ holds or we have neither $x \le y$ nor $y \le x$ but $|y|_1 < |x|_1$ holds. This worst case
model yields an upper bound of $O(n^{3/2})$ for the expected optimization time of the (1+1) EA
with mutation probability p(n) = 1/n on monotone functions.

Theorem 3. The expected optimization time of the (1+1) EA with mutation probability p(n) = 
1/n is $O(n^{3/2})$ for every monotone function.

Our main result is that using mutation probability p(n) = c/n where c is a sufficiently large
constant, optimization of monotone functions can become very difficult for the (1+1) EA. This 
is the first result where increasing the mutation probability by a constant factor increases the 
optimization time from polynomial to exponential with overwhelming probability. 

Theorem 4. There exists a monotone function $f\colon \{0,1\}^n \to \mathbb{N}$ such that the (1+1) EA with
mutation probability p(n) = c/n, c > 33 constant, does not optimize f within $2^{\Omega(n)}$ mutations
with overwhelming probability.

The formal proof of this result is somewhat technical and lengthy. Therefore, in this extended 
abstract, we present how to construct such a monotone function f in the following section. In
Section 4 we describe why this function is difficult to optimize. Complete proofs are available
from the authors on request.

3 A Difficult to Optimize Monotone Function 

In this section, we describe a monotone function that is difficult to optimize via a (1+1) EA with 
mutation probability p(n) = c/n, if c > 33 is constant.

The main idea is the construction of a kind of long path function (compare the work by Horn, 
Goldberg, and Deb [HGD94]). We also have an exponentially long path such that shortcuts
can only be taken if a large number of bits flip simultaneously, a very unlikely event. The 
construction is complicated by the fact that the function needs to be monotone. Hence we 
cannot forbid leaving the path by giving the boundary of the path an unfavorable fitness. We 
solve this problem, roughly speaking, by implementing the path on a level of bit strings having 
similar numbers of 1-bits. Monotonicity simply forbids leaving the level to strings having fewer 
1-bits. The crucial part of our construction is setting up the function in such a way that, in spite 
of monotonicity, not too many 1-bits are collected. 

For $x \in \{0,1\}^n$ let $B \subseteq [n]$ be a subset of all indices [n]. The bits $x_i$ with $i \in B$ are referred
to as window. The bits $x_i$ with $i \notin B$ are outside the window. Inside the window the function
value is given by BinVal. The weights for BinVal are ordered differently for each window in
order to avoid correlation between windows. The window is placed such that there is only a
small number of 0-bits outside the window. Reducing the number of 0-bits outside causes the
window to be moved. This is a likely event that happens frequently. However, we manage to
construct an exponentially long sequence of windows with the additional property that in order
to come from one window to one with large distance in this sequence a large number of bits needs
to be flipped simultaneously. Since this is highly unlikely, it is very likely that the sequence of
windows is followed. This takes an exponential number of steps with overwhelming probability.
Droste, Jansen, and Wegener [DJW98] embed the long path into a unimodal function in a way
that the (1+1) EA reaches the beginning of the path with probability close to 1. We adopt this
technique and extend it to our monotone function.

The following Lemma 5 defines the sequence of windows of our function by defining the index
sets $B_i$. The property that windows with large distance have large Hamming distance is formally
stated as $|i - j| \ge \ell \Rightarrow |B_i \cap B_j| \le \gamma\ell$ for $\ell = \Theta(n)$ and some constant $\gamma > 0$.

Lemma 5. Let $\beta, \gamma \in \mathbb{R}$ be constants with $0 < \beta$ and $\rho := \beta/(1 - 2\beta) < \gamma < 1$. Let $n \in \mathbb{N}$
and $L := \lfloor \exp((\gamma - \rho)^2 (1 - 2\beta) n / 6) \rfloor$. Let $\ell := \beta n$ and $L' := L - \ell + 1$. Then there are
$b_1, b_2, \dots, b_L \in [n]$ such that the following holds. Let $B_i := \{b_i, \dots, b_{i+\ell-1}\}$ for all $i \in [L']$.

Then

(i) $|B_i| = \ell$ for all $i \in [L']$,

(ii) $|B_i \cap B_j| \le \gamma\ell$ for all $i, j \in [L']$ such that $|i - j| \ge \ell$.

Proof. The proof invokes the probabilistic method [AS00], that is, we describe a way to randomly
choose the $b_i$ that ensures that properties (i) and (ii) hold with positive probability. This
necessarily implies the existence of such a sequence.

Let the $b_1, b_2, \dots, b_L$ be chosen uniformly at random subject to condition (i). More precisely,
let $b_1 \in [n]$ be chosen uniformly at random. If $b_1, \dots, b_{i-1}$ are already chosen, then choose $b_i$
from $[n] \setminus \{b_{\max\{1, i-\ell\}}, \dots, b_{i-1}\}$ uniformly at random.

Let $i, j \in [L']$ with $i < j$ and $|i - j| \ge \ell$. By definition, the sets $B_i$ and $B_j$ do not share
an index. Fix any outcome of $B_i$. For all $k \in \{0, \dots, \ell - 1\}$ we have that, conditional on any
outcomes of all other $b$s, the probability that $b_{j+k} \in B_i$ is at most $|B_i|/(n - 2\ell) = \beta/(1 - 2\beta)$.
Consequently, the random variable $C = |B_i \cap B_j|$ is dominated by a random variable X that is the
sum of $\ell$ independent indicator random variables that are one with probability $\rho = \beta/(1 - 2\beta)$.
Hence a simple Chernoff bound (cf. e.g. [MU05]) yields

$$\Pr(C \ge \gamma\ell) \le \Pr(X \ge \gamma\ell) = \Pr\left(X \ge \left(1 + \frac{\gamma - \rho}{\rho}\right) \rho\ell\right) \le \exp\left(-\left(\frac{\gamma - \rho}{\rho}\right)^2 \rho\ell / 3\right).$$

Since there are fewer than $L^2$ choices of $i, j \in [L']$, a simple union bound yields

$$\Pr\left(\exists i, j \in [L'] : (|i - j| \ge \ell) \wedge (|B_i \cap B_j| \ge \gamma\ell)\right) < L^2 \exp\left(-\left(\frac{\gamma - \rho}{\rho}\right)^2 \rho\ell / 3\right) \le 1. \qquad \Box$$
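
The random experiment used in the proof can be simulated for small parameters. The following Python sketch is only an illustration under our own assumptions: the names `sample_windows` and `max_far_intersection` are hypothetical, indices are drawn from {1, ..., n}, and L is chosen tiny rather than exponential in n. It therefore shows the sampling rule and the two properties on an example but does not verify the lemma.

```python
import random
from typing import FrozenSet, List


def sample_windows(n: int, ell: int, L: int, seed: int = 0) -> List[FrozenSet[int]]:
    """Draw b_1..b_L as in the proof and return the windows B_1..B_{L-ell+1}.

    Each b_i is uniform on [n] minus the (at most ell) previously chosen
    indices b_{max(1, i-ell)}, ..., b_{i-1}. Requires n > ell.
    """
    rng = random.Random(seed)
    b: List[int] = []
    for i in range(L):
        recent = set(b[max(0, i - ell):i])
        candidates = [v for v in range(1, n + 1) if v not in recent]
        b.append(rng.choice(candidates))
    # Window B_i consists of ell consecutive entries of the sequence.
    return [frozenset(b[i:i + ell]) for i in range(L - ell + 1)]


def max_far_intersection(windows: List[FrozenSet[int]], ell: int) -> int:
    """Largest |B_i ∩ B_j| over all pairs with |i - j| >= ell."""
    sizes = (
        len(windows[i] & windows[j])
        for i in range(len(windows))
        for j in range(i + ell, len(windows))
    )
    return max(sizes, default=0)


if __name__ == "__main__":
    ws = sample_windows(n=200, ell=10, L=100, seed=1)
    print(len(ws), {len(w) for w in ws}, max_far_intersection(ws, 10))
```

Because every index differs from its predecessors within distance ell, each window automatically has exactly ell elements, which is property (i).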

The following definition introduces the difficult monotone function we consider. Note that it
assumes the sequence of windows $B_i$ to be given. For $x \in \{0,1\}^n$ some $i \in [L']$ is a potential
position in the sequence of windows if the number of 0-bits outside the window $B_i$ is limited by
$\alpha n$, $\alpha > 0$ some constant. We select the largest potential position i as actual position and have
the function value for x depend mostly on this position. If no potential position i exists, we have
not yet found the path of windows and lead the (1+1) EA towards it. If i = L', i.e., the end of
the path is reached, the (1+1) EA is led towards the unique global optimum via OneMax.
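
The notion of potential and actual position translates directly into code. The sketch below is our own illustration with hypothetical names `potential_positions` and `actual_position`; windows are given as sets of 1-based bit indices, positions are numbered from 1, and "limited by" is read as "at most".

```python
from typing import FrozenSet, List, Optional, Sequence


def potential_positions(
    x: Sequence[int], windows: List[FrozenSet[int]], alpha: float
) -> List[int]:
    """All positions i (1-based) whose window B_i leaves at most alpha*n
    0-bits of x outside the window."""
    n = len(x)
    zeros = {j for j in range(1, n + 1) if x[j - 1] == 0}
    return [
        i
        for i, window in enumerate(windows, start=1)
        if len(zeros - window) <= alpha * n
    ]


def actual_position(
    x: Sequence[int], windows: List[FrozenSet[int]], alpha: float
) -> Optional[int]:
    """Largest potential position, or None if the path is not yet found."""
    positions = potential_positions(x, windows, alpha)
    return max(positions) if positions else None


if __name__ == "__main__":
    toy_windows = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
    print(actual_position([1, 0, 0, 1], toy_windows, alpha=0.25))
    print(actual_position([0, 0, 0, 0], toy_windows, alpha=0.25))
```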
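
Definition 6. Let $\beta$, $\gamma$, $\ell$, $L$, $L'$, the $b_i$ and $B_i$ be as in Lemma 5. Let $\alpha \in \mathbb{R}$ with $0 < \alpha < \beta$.
For $x \in \{0,1\}^n$ let $B_x := \{i \in [L'] \mid |\{j \in [n] \mid x_j = 0\} \setminus B_i| \le \alpha n\}$. Let $i^*_x := \max B_x$, if
$B_x$ is non-empty. For $i \in [L']$ let $\pi^{(i)}$ be a permutation of $B_i$. Denote by $\Pi = (\pi^{(1)}, \dots, \pi^{(L')})$
the sequence of these permutations. We use the short-hand $\pi^{(i)}(x)$ to denote the vector obtained
from permuting the components of $(x_{b_i}, \dots, x_{b_{i+\ell-1}})$ according to $\pi^{(i)}$. Consequently,
$\pi^{(i)}(x) = (x_{\pi^{(i)}(b_i)}, \dots, x_{\pi^{(i)}(b_{i+\ell-1})})$.

We define $f_\Pi\colon \{0,1\}^n \to \mathbb{N}$ via

$$f_\Pi(x) := \begin{cases}
|x|_{[n] \setminus B_1}|_1 \cdot 2^n + \textsc{BinVal}\left(\pi^{(|x|_{[n] \setminus B_1}|_1)}(x)\right), & \text{if } B_x = \emptyset, \\
i^*_x \cdot 2^{2n} + \textsc{BinVal}\left(\pi^{(n + i^*_x)}(x)\right), & \text{if } B_x \ne \emptyset \text{ and } n + i^*_x < L', \\
L \cdot 2^{3n} + |x|_1, & \text{otherwise.}
\end{cases}$$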

We state one observation concerning the function $f_\Pi$ that is important in the following. It
states that as long as the end of the path of windows is not found the number of 0-bits outside
is not only bounded by $\alpha n$ but equals $\alpha n$. This property will be used later on to show that the
window is moved frequently.

Lemma 7. Let $f_\Pi\colon \{0,1\}^n \to \mathbb{N}_0$ be as in Definition 6. Let $x \in \{0,1\}^n$ with $B_x \ne \emptyset$ and
$i^*_x = \max B_x$. If $n + i^*_x < L'$, then $|Z(x) \setminus B_{i^*_x}| = \alpha n$.

Proof. By assumption we have $n + i^*_x < L'$. We consider $B_{n+i^*_x+1}$ and see that the set coincides
with $B_{n+i^*_x}$ in all but two elements: we have $B_{n+i^*_x} \setminus B_{n+i^*_x+1} = \{b_{n+i^*_x}\}$ and
$B_{n+i^*_x+1} \setminus B_{n+i^*_x} = \{b_{n+i^*_x+\ell}\}$. Consequently, $|Z(x) \setminus B_{n+i^*_x}|$ and $|Z(x) \setminus B_{n+i^*_x+1}|$ differ by at most one. Thus,
$|Z(x) \setminus B_{n+i^*_x}| < \alpha n$ implies $|Z(x) \setminus B_{n+i^*_x+1}| \le \alpha n$ and we can replace $i^*_x$ by $i^*_x + 1$. This
contradicts $i^*_x = \max B_x$. We have $|Z(x) \setminus B_{n+i^*_x}| \le \alpha n$ by definition and thus $|Z(x) \setminus B_{n+i^*_x}| = \alpha n$
follows. □

Our first main claim is that $f_\Pi$ is in fact monotone. This is not difficult, but might, due to
the complicated definition of $f_\Pi$, not be obvious.

Lemma 8. For all $\Pi$ as above, $f_\Pi$ is monotone.

Proof. Let $f := f_\Pi$. Let $x \in \{0,1\}^n$ and $j \in [n]$ such that $x_j = 0$. Let $y \in \{0,1\}^n$ be such that
$y_k = x_k$ for all $k \in [n] \setminus \{j\}$ and $y_j = 1 - x_j$. That is, y is obtained from x by flipping the jth
bit (which is zero in x) to one. To prove the lemma, it suffices to show $f(x) < f(y)$.

Let first $B_x = \emptyset$. If $B_y \ne \emptyset$ we have $f(x) \le n \cdot 2^n + 2^n$ and $f(y) \ge 2^{2n}$, so $f(x) < f(y)$
follows. If $B_y = \emptyset$ we have either $|x|_{[n] \setminus B_1}|_1 < |y|_{[n] \setminus B_1}|_1$ (in case $j \notin B_1$) or
$\textsc{BinVal}(\pi^{(|x|_{[n] \setminus B_1}|_1)}(x)) < \textsc{BinVal}(\pi^{(|y|_{[n] \setminus B_1}|_1)}(y))$ (in case $j \in B_1$). In both cases, $f(x) < f(y)$ holds.

Now assume $B_x \ne \emptyset$ and $n + i^*_x < L'$. By definition $B_x \subseteq B_y$, hence $i^*_y \ge i^*_x$. If $i^*_y = i^*_x$,
then $j \in B_{i^*_x}$ and $f(y) > f(x)$ follows from $\textsc{BinVal}(\pi^{(n+i^*_y)}(y)) = \textsc{BinVal}(\pi^{(n+i^*_x)}(y)) >
\textsc{BinVal}(\pi^{(n+i^*_x)}(x))$. If $i^*_y > i^*_x$, then $f(y) > f(x)$. In all other cases, $f(x) = L \cdot 2^{3n} + |x|_1$ and
$f(y) = L \cdot 2^{3n} + |y|_1$, hence $f(y) > f(x)$. □

4 Proof of the Lower Bound 

Theorem 9. Consider the (1+1) EA with mutation probability c/n for c > 33 on the function
$f := f_\Pi$ from Definition 6 where $\Pi$ is chosen uniformly at random and the parameters are
chosen according to $\beta := 10/131$, $\gamma := 20/221$, and $\alpha := 1/(1000c)$. There is a constant $\kappa > 0$
such that with probability $1 - 2^{-\Omega(n)}$ the (1+1) EA needs at least $2^{\kappa n}$ generations to optimize f.
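
The concrete parameter values can be checked against the side conditions stated in Lemma 5 and Lemma 10 with exact rational arithmetic. The helper below is our own sanity check, not part of the proof; the name `check_parameters` and the returned dictionary keys are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Union


def check_parameters(c: Union[int, Fraction]) -> Dict[str, object]:
    """Evaluate the parameter constraints for beta = 10/131,
    gamma = 20/221 and alpha = 1/(1000 c) exactly."""
    beta = Fraction(10, 131)
    gamma = Fraction(20, 221)
    alpha = Fraction(1) / (1000 * Fraction(c))
    rho = beta / (1 - 2 * beta)
    return {
        "rho": rho,
        # Lemma 5: 0 < beta and rho < gamma < 1.
        "lemma5": 0 < beta and rho < gamma < 1,
        # Lemma 10: 0 < alpha < beta and 1/11 - alpha/beta > gamma > rho.
        "lemma10": 0 < alpha < beta
        and Fraction(1, 11) - alpha / beta > gamma > rho,
    }


if __name__ == "__main__":
    for c in (31, 34, 100):
        print(c, check_parameters(c))
```

With these values $\rho = 10/111$, which is slightly smaller than $\gamma = 20/221$, so the margin between the constants is small and exact fractions avoid rounding issues.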

The result above shows that if f is chosen randomly (according to the construction described),
then the (1+1) EA w.o.p. needs an exponential time to find the optimum. Clearly, this implies
that there exists a particular function f, that is, a choice of $\Pi$, such that the EA faces these
difficulties. This is Theorem 4.

The proof of Theorem 9 is long and technical. Therefore, we only discuss the main proof
ideas here. A complete proof can be found in the appendix.

Both after a typical initialization, when $B_x = \emptyset$, and afterwards, when $B_x \ne \emptyset$ and $n + i^*_x < L'$,
we have the following situation. There is a window of bits ($B_{i^*_x}$ if $i^*_x$ is defined and $B_1$ otherwise)
such that the fitness increases with BinVal as a function on the bits inside the window. Moreover,
the fitness is always increased in case the mutation decreases the number of 0-bits outside the
window. If $B_x = \emptyset$ this is due to the term $|x|_{[n] \setminus B_1}|_1 \cdot 2^n$ in the fitness function and otherwise
it is because the current $i^*$-value has increased. The gain in fitness is so large that it dominates
any change of the bits in the window.

We claim that with this construction it is very likely that the current window always contains
at least $\beta n/11$ 0-bits. This is proven by showing that in case the number of 0-bits in the window
is in the interval $[\beta n/11, \beta n/10]$ then there is a tendency ("drift") to increase the number of 0-bits
again. Applying a drift theorem by Oliveto and Witt [OW10] yields that even in an exponential
number of generations the probability that the number of 0-bits in the window decreases below
$\beta n/11$ is exponentially small. We first elaborate on why this drift holds and then explain how
the lower bound of $\beta n/11$ 0-bits implies the claim.

If a mutation decreases the number of 0-bits outside the window, the bits inside the window
are subject to random, unbiased mutations. Hence, if the number of 0-bits is at most $\beta n/10$ the
expected number of bits flipping from 1 to 0 is larger than the expected number of bits flipping
from 0 to 1. If the mutation probability is large enough, this makes up for the 0-bits lost outside
the window and it leads to a net gain in 0-bits in expectation, with regard to the whole bit string.
Note that the window is moved during such a mutation. As by Lemma 7 the number of 0-bits
outside the window is fixed to $\alpha n$, we have a net gain in 0-bits for the window, regardless of its
new position.

In case the number of 0-bits outside the window remains unchanged, acceptance depends on a
BinVal instance on the bits inside the window. For BinVal, accepting the result of a mutation
is completely determined by the flipping bit with the largest weight. In an accepted step, this bit
must have flipped from 0 to 1. All bits with smaller weights have no impact on acceptance and
therefore are subject to random, unbiased mutations. If, among all bits with smaller weights,
there is a sufficiently small rate of 0-bits, more bits will flip from 1 to 0 than from 0 to 1. In
this case, we again obtain a net increase in the number of 0-bits in the window, in expectation.
Here we also require a large mutation probability since every increase of BinVal implies that
one 0-bit has been lost and a surplus of flipping 1-bits has to make up for this loss. This holds
in particular since the window only contains $\beta n$ bits and the surplus' absolute value must still
be large.
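
The acceptance rule for BinVal used in this argument can be confirmed by brute force on short bit strings. The sketch below is our own illustration; `bin_val` and `accepted_by_leading_flip` are hypothetical names, and bit strings are tuples ordered by decreasing weight.

```python
from itertools import product
from typing import Sequence


def bin_val(x: Sequence[int]) -> int:
    """BinVal with x[0] carrying the largest weight 2^(n-1)."""
    n = len(x)
    return sum(bit << (n - 1 - i) for i, bit in enumerate(x))


def accepted_by_leading_flip(x: Sequence[int], y: Sequence[int]) -> bool:
    """Predict acceptance of y from the flipped bit with the largest weight:
    accept iff nothing flipped or that bit went from 0 to 1."""
    for old, new in zip(x, y):
        if old != new:
            return old == 0
    return True


def rule_matches_bin_val(n: int) -> bool:
    """Compare the rule with the selection condition f(y) >= f(x) for all pairs."""
    points = list(product((0, 1), repeat=n))
    return all(
        (bin_val(y) >= bin_val(x)) == accepted_by_leading_flip(x, y)
        for x in points
        for y in points
    )


if __name__ == "__main__":
    print(rule_matches_bin_val(5))
```

All bits of smaller weight than the leading flipped bit are ignored by the rule, which is exactly why they behave like unbiased random mutations in an accepted step.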

For a fixed BinVal instance the bits tend to develop correlations between bit values and
weights over time; bits with large weights are more likely to become 1 than bits with small
weights. This development is disadvantageous since the above argument relies on many 1-bits
with small weights. In order to break up these correlations we use random instances of BinVal
wherever possible. These random instances change quickly. If $B_x = \emptyset$ and, by Lemma 7, also if
$n + i^*_x < L'$ we have at least $\alpha n$ 0-bits outside the current window and every mutation that flips
exactly one of these bits leads to a new BinVal instance. Since this happens with probability
$\Omega(1)$, this frequently prevents the algorithm from gathering 1-bits at bits with large weights.
Pessimistically dealing with bits that have been touched by mutation while optimizing the same
BinVal instance, a positive expected increase in the number of 0-bits can be shown.

How does the lower bound of $\beta n/11$ 0-bits inside the window imply Theorem 9? With
overwhelming probability we start with $B_x = \emptyset$ and at least $\beta n/10$ 0-bits in the window $B_1$.
We maintain at least $\beta n/11$ 0-bits in $B_1$, while the algorithm is encouraged to turn the 0-bits
outside of $B_1$ to 1 quickly. Once the number of 0-bits outside of $B_1$ has decreased to or below
$\alpha n$, the path has been reached. The 0-bits in $B_1$ thereby ensure that the initial $i^*$-value is at
most $\beta n$. The reason is that every two sets $B_i$, $B_j$ with $|i - j| \ge \ell$ only intersect in at most $\gamma\beta n$
bits, so $\beta n/11$ 0-bits in $B_i$ imply at least $\beta n/11 - \gamma\beta n$ 0-bits outside of $B_j$. For j to become
the new window, however, at most $\alpha n$ 0-bits outside of $B_j$ are allowed. By choice of $\alpha$, $\beta$, and
$\gamma$, moving from $B_1$ to $B_j$ requires a linear number of 0-bits in $B_1$ to flip to 1 if $j > \beta n$. The
described mutation has probability $n^{-\Omega(n)}$. The last argument also shows that the probability of
increasing $i^*$ by more than $\beta n$ in one generation is $n^{-\Omega(n)}$. Hence, with overwhelming probability
in each generation the (1+1) EA only makes progress at most $\beta n$ on the path. As the path has
exponential length, the claimed lower bound follows.

5 Conclusions 

In this work, we analyzed how the (1+1) EA optimizes monotone functions. We showed that 
the optimum of any monotone function is found efficiently if the mutation probability is at most 
1/n. Surprisingly, once the mutation probability exceeds 33/n, the situation drastically changes. 
In this case, there are monotone functions that with high probability are not optimized within 
exponential time. 

This result indicates that, to a greater extent than expected, care has to be taken when
choosing the mutation probability, even if restricting oneself to mutation probabilities $c^*/n$ with
a constant $c^*$. Contrary to previous observations, e.g., for linear functions, it may well happen
that constant factor changes in the mutation probability lead to more than constant factor 
changes in the efficiency. 

Besides generally suggesting more research on the right mutation probability, this work leaves
two particular problems open. (i) For the mutation probability 1/n, give a sharp upper bound
for the optimization time of monotone functions (this order of magnitude is between $\Omega(n \log n)$
and $O(n^{3/2})$). (ii) Determine the largest constant $c^*$ such that the expected optimization time
of the (1+1) EA with mutation probability $p(n) = c^*/n$ is $n^{O(1)}$ on every monotone function.
Currently, we only know that $1 \le c^* \le 33$ holds.

A Appendix



This appendix contains the full formal proofs of those statements for which only an explanation 
of the proof idea was given in the main part of the paper. 

A.1 Proof of Theorem 2

Proof of Theorem 2. Initially, there are at least $\lfloor n/2 \rfloor$ 0-bits in x with probability at least 1/2.
Considering the situation after (n/2) ln n mutations a simple and direct calculation reveals that
with probability $\Omega(1)$ at least one of these bits is never flipped [DJW02]. This yields $\Omega(n \log n)$
as lower bound.

For the upper bound we employ drift analysis [HY01]. First, we consider a distance measure
$d\colon \{0,1\}^n \to \mathbb{N}_0$ with $d(x) := |x|_0$. We have

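$$\begin{aligned}
E(d(x) - d(x^+) \mid x) &= \Pr(x \ne x^+) \cdot E(d(x) - d(x^+) \mid x \wedge x \ne x^+) \\
&\quad + \Pr(x = x^+) \cdot E(d(x) - d(x^+) \mid x \wedge x = x^+) \\
&= \Pr(x \ne x^+) \cdot E(d(x) - d(x^+) \mid x \wedge x \ne x^+) \\
&\ge \binom{|x|_0}{1} \cdot \frac{c}{n} \cdot \left(1 - \frac{c}{n}\right)^{n-1} \cdot E(d(x) - d(x^+) \mid x \wedge x \ne x^+) \\
&\ge \frac{c \, |x|_0}{n \cdot e^c} \cdot E(d(x) - d(x^+) \mid x \wedge x \ne x^+) \\
&\ge \frac{c \, |x|_0}{n \cdot e^c} \cdot \left(1 - (n - 1) \cdot \frac{c}{n}\right) \ge \frac{c (1 - c) \, |x|_0}{n \cdot e^c}
\end{aligned}$$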



and will use this lower bound on $E(d(x) - d(x^+) \mid x)$ later. Note that in the calculation above
only in the inequality

$$E(d(x) - d(x^+) \mid x \wedge x \ne x^+) \ge 1 - (n - 1) \cdot \frac{c}{n}$$

we make use of the fact that the fitness function f is monotone. The event $x \ne x^+$ implies that there
is at least one bit that was mutated from a 0-bit into a 1-bit. This contributes 1 to the expected
difference. The remaining n - 1 bits contribute at least $-c(n-1)/n$ since each flips with mutation
probability c/n only.

Now, we consider a different distance measure $d'\colon \{0,1\}^n \to \mathbb{R}_0^+$ with $d'(x) := H_{d(x)}$ where
$H_k = \sum_{i=1}^{k} 1/i$ denotes the kth Harmonic number. It is easy to see that $H_k - H_\ell \ge (k - \ell)/k$
holds for any $k, \ell \in \mathbb{N}$. We estimate

$$E(d'(x) - d'(x^+) \mid x) = E(H_{d(x)} - H_{d(x^+)} \mid x) \ge E\left(\frac{d(x) - d(x^+)}{d(x)} \,\Big|\, x\right) = \frac{E(d(x) - d(x^+) \mid x)}{d(x)} = \frac{E(d(x) - d(x^+) \mid x)}{|x|_0} \ge \frac{c(1 - c)}{n \cdot e^c}$$

and obtain

$$(\ln(n) + 1) \cdot \frac{n \cdot e^c}{c(1 - c)} = O(n \log n)$$

as upper bound on the expected optimization time by application of the drift theorem [HY01]
since the initial distance d'(x) is bounded by ln(n) + 1. □
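
The explicit bound from this proof is easy to evaluate numerically. The following sketch is our own illustration of the formula $(\ln(n) + 1) \cdot n e^c / (c(1 - c))$; the names `harmonic` and `drift_upper_bound` are hypothetical, and the bound is only meaningful for 0 < c < 1.

```python
import math


def harmonic(k: int) -> float:
    """k-th Harmonic number H_k = sum_{i=1}^{k} 1/i (H_0 = 0)."""
    return sum(1.0 / i for i in range(1, k + 1))


def drift_upper_bound(n: int, c: float) -> float:
    """Upper bound (ln n + 1) * n * e^c / (c * (1 - c)) on the expected
    optimization time, valid for constants 0 < c < 1."""
    if not 0 < c < 1:
        raise ValueError("the drift argument requires 0 < c < 1")
    return (math.log(n) + 1) * n * math.exp(c) / (c * (1 - c))


if __name__ == "__main__":
    for c in (0.1, 0.5, 0.9, 0.99):
        print(c, round(drift_upper_bound(1000, c)))
```

The factor 1/(c(1 - c)) grows without bound as c approaches 1, which reflects why the argument breaks down for c = 1.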

A.2 Proof of Theorem 9



We consider the (1+1) EA with mutation probability c/n and say that the (1+1) EA is on
level $i^*_x$ if x is the current search point. We also speak of phase $i^*_x$ as the random time until the
(1+1) EA increases its current level. Note that many phases can be empty. $B_{i^*_x}$ is called the
current window of bits in situations where we are looking at a trajectory of these sets and want
to emphasize that the bits we are considering might change over time.

The main observation for our analysis is that the current window typically contains at least
$\beta n/11$ 0-bits. This property is maintained even during an exponential number of generations,
with overwhelming probability. Under this condition, the probability of increasing the current
level $i^*_x$ by a large value is very small. Intuitively speaking, the reason for this is that the sets $B_i$
only have a small intersection and many bits have to change in order to move from $B_{i^*_x}$ to some
set $B_j$ with $j \gg i^*_x$. This is made precise in the following lemma.

Lemma 10. Let $0 < \alpha < \beta < \gamma$ be constants such that $1/11 - \alpha/\beta > \gamma > \beta/(1 - 2\beta)$. Let
$f_\Pi$, with respect to $\alpha$, $\beta$, and $\gamma$, be constructed as in Definition 6 for arbitrary $\Pi$. Let x be the
current search point of the (1+1) EA with mutation probability c/n for some $c \in \mathbb{R}^+$ optimizing
$f_\Pi$. Assume that $B_x \ne \emptyset$ and $B_{i^*_x}$ contains at least $\beta n/11$ 0-bits. Then the probability that the
(1+1) EA increases the level $i^*_x$ by more than $\beta n$ in one generation is at most $n^{-\Omega(n)}$.

Proof. Recall that $|B_{i^*_x} \cap B_j| \le \gamma\beta n$ for all $j \ge i^*_x + \beta n$. If $B_{i^*_x}$ contains more than $\alpha n + \gamma\beta n$
0-bits then there are more than $\alpha n$ 0-bits outside of $B_j$. Hence $j \notin B_x$.

A necessary condition for increasing $i^*_x$ to any value $j > i^*_x + \beta n$ is thus that one mutation
decreases the number of 0-bits in $B_{i^*_x}$ to a value below or equal to $\alpha n + \gamma\beta n$. This is a decrease of
at least $\beta n/11 - \alpha n - \gamma\beta n =: \kappa n$ bits for some constant $0 < \kappa < 1$. The probability of flipping
at least $\kappa n$ bits simultaneously is at most $\binom{n}{\kappa n} \cdot (c/n)^{\kappa n} \le c^{\kappa n} \cdot 1/(\kappa n)! = n^{-\Omega(n)}$. □
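
The two probability bounds at the end of the proof can be compared numerically on a logarithmic scale. The sketch below is our own illustration; the names `log_exact_bound` and `log_factorial_bound` are hypothetical, and m stands for the integer number $\kappa n$ of simultaneously flipping bits, treated here as a free parameter.

```python
import math


def log_exact_bound(n: int, c: float, m: int) -> float:
    """Natural log of C(n, m) * (c/n)^m."""
    return math.log(math.comb(n, m)) + m * (math.log(c) - math.log(n))


def log_factorial_bound(c: float, m: int) -> float:
    """Natural log of c^m / m!, using lgamma(m + 1) = ln(m!)."""
    return m * math.log(c) - math.lgamma(m + 1)


if __name__ == "__main__":
    n, c = 1000, 34
    for m in (50, 100, 200, 400):
        print(m, log_exact_bound(n, c, m), log_factorial_bound(c, m))
```

Since ln(m!) grows like m ln m, the factorial bound eventually decreases superexponentially in m, which is the source of the $n^{-\Omega(n)}$ estimate when m is linear in n.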

One conclusion from this lemma is that, with overwhelming probability, the (1+1) EA follows
the path given by the sets $B_i$, as each phase increases the current level by at most $\beta n$. This will
establish the claimed time bound. The rest of this section deals with the proof of the invariance
property on the number of 0-bits in the current window. For this proof we make use of the
following drift theorem by Oliveto and Witt [OW10].

Theorem 11 (Simplified Drift Theorem [OW10]). Let $X_t$, $t \ge 0$, be the random variables
describing a Markov process over a finite state space $S \subseteq \mathbb{R}_0^+$ and denote $\Delta_t(i) := (X_{t+1} - X_t \mid
X_t = i)$ for $i \in S$ and $t \ge 0$. Suppose there exist an interval [a, b] in the state space, two constants
$\delta, \varepsilon > 0$ and, possibly depending on $L := b - a$, a function r(L) satisfying $1 \le r(L) = o(L/\log(L))$
such that for all $t \ge 0$ the following two conditions hold:

1. $E(\Delta_t(i)) \ge \varepsilon$ for $a < i < b$,

2. $\Pr(\Delta_t(i) \le -j) \le r(L)/(1 + \delta)^j$ for $i > a$ and $j \in \mathbb{N}_0$.

Then there is a constant $\kappa > 0$ such that for $T^* := \min\{t \ge 0 : X_t \le a \mid X_0 \ge b\}$ it holds
$\Pr(T^* \le 2^{\kappa L/r(L)}) = 2^{-\Omega(L/r(L))}$.

A prerequisite for this theorem is that the number of 0-bits in the current window increases
in expectation, when the number of 0-bits is in a certain interval. We choose the interval
$[\beta n/11, \beta n/10.5]$, but establish lower bounds for the drift with respect to a larger interval
$[\beta n/11, \beta n/10]$. The larger interval will be used later on when proving that after initialization
the (1+1) EA finds the start of the path.

The "drift" on the number of 0-bits will be bounded from below by positive constants in two 
cases: either the current level remains fixed in one generation or the current level is increased.
We start with the former case and give a lower bound for the number of 0-bits in the current
window.

A.3 Invariance for Non-Sliding Windows

Before we formulate the main statement of this subsection, let us introduce some shorthands. 
For any $x$ let $x_B := x|_{B_{i^*_x}}$ denote the substring of $x$ induced by $B_{i^*_x}$, i.e., the substring in the
current window. Recall that $|x_B|_0$ denotes the number of 0-bits in the current window. That
is, $|x_B|_0 = |\{j \in \{b_{i^*_x}, \ldots, b_{i^*_x+\ell-1}\} \mid x_j = 0\}|$. For readability purposes we write $|x^+_B|_0$ instead of
$|(x^+)_B|_0$ for the number of 0-bits of $x^+$ in its window. Note that, in this subsection, we always
deal with the case that $i^*_x = i^*_{x^+}$ anyway.

In the following we show that, whenever the number of 0-bits in the current window is in
the interval $[\beta n/11, \beta n/10]$, we observe a drift towards more 0-bits. This is formalized in the following
lemma.

Lemma 12. Let $0 < \alpha < \beta < 1$ and $c > 5/(2\beta)$ be constants. Let $n$ be sufficiently large and let
$f$, with respect to $\alpha$ and $\beta$, be constructed as in Definition 8. Let $x$ be the current search point
of the (1+1) EA maximizing $f$. Assume $|x_B|_0 \in [\beta n/11, \beta n/10]$. We denote by $A$ the event that the
(1+1) EA maximizing $f$ and starting in $x$ does not leave the current level, i.e., $i^*_x = i^*_{x^+}$. Then,
the following two statements hold.

1. For every constant $\varepsilon > 0$ the number of different bits that are flipped during phase $i^*_x$ is at
most $2\varepsilon\beta c n$, with probability $1 - \exp(-\Omega(n))$.

2. For small enough $\varepsilon > 0$, assuming that the event from 1. holds, there exists a constant
$\delta > 0$ such that the drift in the number of 0-bits $E(|x^+_B|_0 - |x_B|_0 \mid A)$ is at least $\delta$.

The proof of this lemma will heavily depend on the drift in the number of 0-bits induced by
the random BinVal within the current window. In the proof of Lemma 12 we will have to deal
with variable lengths of the considered bit string. Therefore, the following auxiliary lemma is
formulated for a range of possible bit string lengths.

Lemma 13. Let $0 < \beta < 1$, $0 < \varepsilon < \beta$, and $c > 2(\tfrac{4}{5}\beta - \varepsilon)^{-1}$ be constants. Consider the
(1+1) EA with mutation probability $c/n$ maximizing a random BinVal on $u \ge \beta n - \varepsilon n$ bits as
defined earlier. Let $x$ denote the current search point. Assume $|x|_0 \in [u/11, u/10]$. If the
random assignment of the function weights is independent of the position of the 0-bits, there
exists a constant $\delta > 0$ such that the drift in the number of 0-bits within the current window is
at least $\delta$, i.e., $E(|x^+|_0 - |x|_0) \ge \delta$.

In order to prevent confusion, let us remark that the expectation is drawn both with respect 
to the random assignment of the function weights as well as with respect to the position of the 
0-bits of x. 

Proof of Lemma 13. Let $\beta$, $\varepsilon$, $c$, $u$, and $x$ be as above. As a first observation, let us recall the
following. Whenever $x^+ = x$, it holds that $|x^+|_0 - |x|_0 = 0$. Thus, we are only interested in the
case $x^+ \neq x$. Note that in this case, the construction of BinVal implies that the bit with the
largest weight is one that flips from 0 to 1 as the (1+1) EA would otherwise not accept $\mathrm{mut}(x)$
as a new search point. For all other bits that are being flipped in this iteration, the direction of
the flip (i.e., whether the bit itself is a 0-bit flipping to 1 or a 1-bit flipping to 0) is random
and depends only on the shares of 0- and 1-bits. This will be formalized in the following.

For readability purposes, let us introduce the following notation. For every $k \in \{0, \ldots, u\}$
we denote by $p_k$ the probability that the (1+1) EA flips exactly $k$ bits. Clearly,
$p_k = \binom{u}{k}\left(\tfrac{c}{n}\right)^k\left(1 - \tfrac{c}{n}\right)^{u-k}$ for $k \ge 1$ and $p_0 = \left(1 - \tfrac{c}{n}\right)^u$.

Let us, for the moment, assume that exactly $k$ bits are being flipped and let us consider
the substring of the flipping bits only. If we remove from the substring the bit with the largest
weight (which flips from 0 to 1), we get that the expected number of 0-bits in this reduced
substring equals $(k-1)\frac{|x|_0 - 1}{u-1}$. Analogously, the expected number of 1-bits in the reduced substring equals
$(k-1) - (k-1)\frac{|x|_0 - 1}{u-1}$. We thus obtain for this specific setting that the expected difference
$|x^+|_0 - |x|_0$ equals

$$\Bigl((k-1) - (k-1)\tfrac{|x|_0 - 1}{u-1}\Bigr) - (k-1)\tfrac{|x|_0 - 1}{u-1} - 1 = k\Bigl(1 - 2\tfrac{|x|_0 - 1}{u-1}\Bigr) - \Bigl(2 - 2\tfrac{|x|_0 - 1}{u-1}\Bigr).$$

Now, for any such $k$, it holds that $E(|x^+|_0 - |x|_0 \mid k \text{ bits flip})$ equals the probability that the
flipping bit with the largest weight flips from 0 to 1 (which occurs with probability $|x|_0/u$) times the
drift conditional on $k$ bit flips and $\mathrm{mut}(x) \neq x$. The latter equals
$k\bigl(1 - 2\tfrac{|x|_0 - 1}{u-1}\bigr) - \bigl(2 - 2\tfrac{|x|_0 - 1}{u-1}\bigr)$ as outlined above.

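Combining these observations, we obtain the following.

$$E(|x^+|_0 - |x|_0) = \sum_{k=1}^{u} p_k \frac{|x|_0}{u}\Bigl[k\Bigl(1 - 2\tfrac{|x|_0 - 1}{u-1}\Bigr) - \Bigl(2 - 2\tfrac{|x|_0 - 1}{u-1}\Bigr)\Bigr].$$

Clearly, $\sum_{k=0}^{u} p_k = 1$ as we are dealing with a distribution. Thus, $\sum_{k=1}^{u} p_k = 1 - (1 - \tfrac{c}{n})^u$. On
the other hand, $\sum_{k=1}^{u} p_k k = \tfrac{c}{n}u \ge c(\beta - \varepsilon)$ as this sum equals the expected number of bit flips.
This yields

$$E(|x^+|_0 - |x|_0) \ge \frac{|x|_0}{u}\Bigl[\Bigl(1 - 2\tfrac{|x|_0 - 1}{u-1}\Bigr)c(\beta - \varepsilon) - \Bigl(2 - 2\tfrac{|x|_0 - 1}{u-1}\Bigr)\Bigl(1 - \bigl(1 - \tfrac{c}{n}\bigr)^u\Bigr)\Bigr].$$

Plugging in $\frac{|x|_0}{u} \ge \frac{1}{11}$ and $0 \le \frac{|x|_0 - 1}{u-1} \le \frac{|x|_0}{u} \le \frac{1}{10}$ yields

$$E(|x^+|_0 - |x|_0) \ge \frac{1}{11}\Bigl[\bigl(1 - \tfrac{2}{10}\bigr)c(\beta - \varepsilon) - 2\Bigr] \ge \frac{1}{11}\Bigl[c\bigl(\tfrac{4}{5}\beta - \varepsilon\bigr) - 2\Bigr],$$

which, for $c > 2(\tfrac{4}{5}\beta - \varepsilon)^{-1}$, can clearly be bounded from below by some positive constant $\delta$. □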
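
The identities used in this proof are easy to check numerically. The sketch below is only an illustration under assumed parameters ($n = 1000$, $u = 300$, $c = 10$, $\beta = 0.3$, $\varepsilon = 0.01$); the function names are hypothetical, and the interval endpoints $1/11$ and $1/10$ as well as the factor $\tfrac{4}{5}$ are taken from the surrounding text. It evaluates the drift sum and compares it with the closed-form lower bound.

```python
import math


def flip_distribution(u: int, n: int, c: float) -> list[float]:
    """p_k = binom(u, k) (c/n)^k (1 - c/n)^(u-k) for k = 0..u."""
    p = c / n
    return [math.comb(u, k) * p**k * (1 - p) ** (u - k) for k in range(u + 1)]


def drift(u: int, n: int, c: float, zeros: int) -> float:
    """Drift sum from the proof with q = (zeros - 1)/(u - 1)."""
    pk = flip_distribution(u, n, c)
    q = (zeros - 1) / (u - 1)
    return sum(
        pk[k] * (zeros / u) * (k * (1 - 2 * q) - (2 - 2 * q))
        for k in range(1, u + 1)
    )


def lemma_lower_bound(c: float, beta: float, eps: float) -> float:
    """(1/11) * [c * (4/5 * beta - eps) - 2], the final bound in the proof."""
    return (c * (0.8 * beta - eps) - 2) / 11


# Assumed example values; c = 10 exceeds 2 / (0.8 * 0.3 - 0.01).
pk = flip_distribution(300, 1000, 10.0)
print(sum(pk), sum(k * p for k, p in enumerate(pk)))
print(drift(300, 1000, 10.0, 30), lemma_lower_bound(10.0, 0.3, 0.01))
```

The first printed pair illustrates $\sum_k p_k = 1$ and $\sum_k k\,p_k = cu/n$; the second compares the exact drift sum with the lower bound for one admissible number of 0-bits.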

We can now easily deduce Lemma 12.

Proof of Lemma 12. Let $A$, $\beta$, $\alpha$, $f$, and $x$ be as in the statement of the lemma. Furthermore,
let us assume that event $A$ holds. That is, the acceptance of the mutated bit string $\mathrm{mut}(x)$ is
fully determined by the random BinVal within the current window. Thus, we can restrict our
attention to the current window.

Let us begin with proving the first claim. For this purpose, let $\varepsilon > 0$ be a constant. We prove
an auxiliary claim stating that with probability $1 - \exp(-\Omega(n))$ the time $T_{i^*_x}$ until the (1+1) EA
exits level $i^*_x$ is at most $\varepsilon n$. That is, we can assume that phase $i^*_x$ does not take longer than $\varepsilon n$
steps. We then show how to derive the original claim.

By construction, the (1+1) EA exits level $i^*_x$ if exactly one of the $\alpha n$ 0-bits outside the current
window is being flipped. Thus, the probability $\Pr(i^*_x \neq i^*_{x^+})$ to exit the current level in one step
is at least $\alpha n \cdot \tfrac{c}{n} \cdot (1 - \tfrac{c}{n})^{n-1} \ge \alpha c e^{-c}(1 - \tfrac{c}{n})^{c-1}$. It follows that the probability of not exiting level
$i^*_x$ in $\varepsilon n$ steps is at most $(1 - \alpha c e^{-c}(1 - \tfrac{c}{n})^{c-1})^{\varepsilon n} \le \exp(-\alpha c e^{-c}(1 - \tfrac{c}{n})^{c-1}\varepsilon n) = \exp(-\Omega(n))$.

Now, the expected number of bits that have been flipped in $\varepsilon n$ steps is at most $\varepsilon n \cdot \beta n \cdot \tfrac{c}{n} = \varepsilon\beta c n$.
We apply Chernoff bounds, and obtain that the probability that more than $2\varepsilon\beta c n$ bits are
being flipped in $\varepsilon n$ steps is at most $\exp(-\tfrac{1}{3}\varepsilon\beta c n) = \exp(-\Omega(n))$.

We continue with the second claim. Therefore, let us assume that no more than $2\varepsilon\beta c n$ bits
are being flipped during phase $i^*_x$. We note already here that we can conclude the following. The
probability of flipping in the current iteration a bit that has already been flipped in a former
iteration of phase $i^*_x$ is at most $2\varepsilon\beta c n \cdot \tfrac{c}{n} = 2\varepsilon\beta c^2$. That is, $\Pr(G \mid A) \le 2\varepsilon\beta c^2$, where $G$ is the event defined next.

We denote by $G$ the event that in the current iteration, the (1+1) EA flips a bit that has
already been flipped in a former iteration of phase $i^*_x$ and rewrite

$$E(|x^+_B|_0 - |x_B|_0 \mid A) = \Pr(G \mid A)\,E(|x^+_B|_0 - |x_B|_0 \mid A \wedge G) + (1 - \Pr(G \mid A))\,E(|x^+_B|_0 - |x_B|_0 \mid A \wedge \bar{G}),$$

with $\bar{G}$ denoting the complementary event of $G$. Now, whenever $G$ occurs, we adopt a worst
case view by assuming that all bits flip in the wrong direction, i.e., from 0 to 1. For this purpose,
let us, for the moment, assume that $G$ holds. In this case, at least one bit flips and we can, very
pessimistically, assume that each of the flipping bits reduces the number of 0-bits by 1. Note
that, given that one bit flips, the expected number of total bit flips in the current window equals
$1 + \tfrac{c}{n}(\beta n - 1) \le 1 + c\beta$. Thus, we can bound $E(|x^+_B|_0 - |x_B|_0 \mid A \wedge G)$ from below by $-1 - c\beta$.
That is, under our assumption, it holds that

$$\Pr(G \mid A)\,E(|x^+_B|_0 - |x_B|_0 \mid A \wedge G) \ge 2\varepsilon\beta c^2(-1 - c\beta).$$

We now need to give bounds for the second summand. For this purpose, we apply Lemma 13.
As we are conditioning on $\bar{G}$, we apply the auxiliary lemma with $u$ denoting the number of
bits that have not been flipped in any former iteration of phase $i^*_x$. Furthermore, we are only
interested in the substring $\tilde{x}$ of $x$ consisting of these $u$ yet unflipped bits. As we have seen in
the first part of this proof, with probability $1 - \exp(-\Omega(n))$ it holds that $u \ge \beta n - 2\varepsilon\beta c n$. As $\beta$
and $c$ are constants, we can choose $\varepsilon$ small enough, such that $2\varepsilon\beta c < \beta$. Furthermore, as $c > 5/(2\beta)$
we can choose $\varepsilon$ small enough such that $c > 2(\tfrac{4}{5}\beta - 2\varepsilon\beta c)^{-1}$. Application of Lemma 13 yields
$E(|x^+_B|_0 - |x_B|_0 \mid A \wedge \bar{G}) \ge \delta'$ for some positive constant $\delta'$.

Altogether we obtain that

$$E(|x^+_B|_0 - |x_B|_0 \mid A) \ge 2\varepsilon\beta c^2(-1 - c\beta) + (1 - 2\varepsilon\beta c^2)\delta'.$$

Lastly, we observe that we can choose $\varepsilon$ small enough such that this term can be bounded from
below by some positive constant $\delta$, as claimed. □



A.4 Invariance for Sliding Windows

We now consider the case where the current level is increased, i.e., a transition from $i^*_x$ to $i^*_{x^+}$
with $i^*_x < i^*_{x^+} \le L'$ happens. Recall that $x_B := x|_{B_{i^*_x}}$ denotes the substring of $x$ induced by $B_{i^*_x}$,
i.e., the substring in the current window. Moreover, let $x^+_B := x^+|_{B_{i^*_{x^+}}}$ denote the substring of
$x^+$ induced by $B_{i^*_{x^+}}$. Note that, in this subsection, we deal with the case $i^*_x \neq i^*_{x^+}$ and thus,
$B_{i^*_x} \neq B_{i^*_{x^+}}$ and $x^+_B \neq x_B$ holds. As before, let $|x_B|_0$ denote the number of 0-bits of $x$ and $|x^+_B|_0$
the number of 0-bits of $x^+$ in the corresponding current window.

We show that also in this situation we have a drift in the number of 0-bits within the current 
window that is bounded below by a positive constant. Due to the transition it is no longer 
sufficient to only consider changes within the current window. Furthermore, transitions are 
often triggered by changes outside the current window. Thus, we switch to a form of a "global" 
view and take into account both the changes within the current window as well as changes outside 
the current window. We formalize this within the next lemma.

Lemma 14. Let $0 < \alpha < \beta < 1$ with $5/(2\beta) < c < 1/(3\alpha)$ be constants. Let $n$ be sufficiently
large and let $f$, with respect to $\alpha$ and $\beta$, be constructed as in Theorem 9. Let $x$ be the current
search point of the (1+1) EA maximizing $f$. We denote by $A$ the event that a transition from
level $i^*_x$ to $i^*_{x^+}$ with $i^*_x < i^*_{x^+} \le L'$ occurs in an iteration of the (1+1) EA maximizing $f$. Assume
$|x_B|_0 \le (1/10)\beta n$.

Then, there is a constant $\delta > 0$ such that the drift in the number of 0-bits is at least $\delta$, i.e.,
$E(|x^+_B|_0 - |x_B|_0 \mid A) \ge \delta$.

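Proof. Let $\bar{B}_{i^*_x} = [n] \setminus B_{i^*_x}$, the indices not contained in the current window, and $x_{\bar{B}} := x|_{\bar{B}_{i^*_x}}$ the
corresponding induced substring of $x$. Analogously, we define $x^+_{\bar{B}} := x^+|_{\bar{B}_{i^*_{x^+}}}$. Due to Lemma 7
we have $|x_{\bar{B}}|_0 = |x^+_{\bar{B}}|_0 = \alpha n$.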

The main part of the proof is to derive a lower bound on $E(|x^+_B|_0 \mid A)$. Afterwards we show
that this bound together with the given prerequisites on $\alpha$, $\beta$, $c$ and $|x_B|_0$ yields a positive drift
in the number of 0-bits.

It is easy to see that, conditional on $A$, the expected number of 0-bits in the new window
$B_{i^*_{x^+}}$ after a transition from $i^*_x$ to $i^*_{x^+}$ can be derived as the difference of the expected number of
0-bits in the current window $B_{i^*_x}$ after mutation and the expected amount of 0-bits lost outside
the current window due to mutation:

$$E(|x^+_B|_0 \mid A) = E(|\mathrm{mut}(x_B)|_0 \mid A) - E(|x_{\bar{B}}|_0 - |\mathrm{mut}(x_{\bar{B}})|_0 \mid A) \qquad (1)$$

We derive bounds for both parts of (1) separately. We start with a lower bound on the expected
number of 0-bits in the current window after mutation, i.e., $E(|\mathrm{mut}(x_B)|_0 \mid A)$, by the following
case distinction.

In the first case, the transition happens independently of the change in the window. This case
happens with probability $\Omega(1)$ as a 1-bit mutation of one of the $\alpha n$ 0-bits outside the current
window suffices. In this situation, the expected number of 0-bits in the window is independent
of $A$ and thus can be easily calculated as follows.

$$\Bigl(1 - \frac{c}{n}\Bigr)|x_B|_0 + \frac{c}{n}|x_B|_1 = |x_B|_0 - \frac{c}{n}|x_B|_0 + \frac{c}{n}|x_B|_1 = |x_B|_0 + \frac{c}{n}(\beta n - 2|x_B|_0) = |x_B|_0 + c\beta - \frac{2c}{n}|x_B|_0$$

For the second case, i.e., if the mutation within the current window has influence on the transition
performed, we have to be more careful as the expected number of 0-bits within the window is no
longer independent of $A$. However, before the mutation the leftmost bit in the current window is
zero since otherwise we would have been able to increase the fitness by performing a transition
by exactly one position. If this leftmost zero is flipped a transition is performed. Furthermore,
it is necessary to flip this leftmost zero if the mutation within the window is supposed to have
influence on the transition performed. The probability to flip this bit is $c/n$ and thus, the
probability for this case is at most $c/n$.

We bound the contribution of this case in a pessimistic way. Similar to Lemma 10 we see that
the number of bits flipping in one single iteration is at most 0(log n) with probability 1 — n~ . 
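Furthermore, the contribution is at most $\beta n$ otherwise. Altogether, this yields a contribution to
the expected value of at most

$$\frac{c}{n} \cdot \bigl((1 - q) \cdot \log n + q \cdot \beta n\bigr) = O\Bigl(\frac{\log n}{n}\Bigr),$$

where $q$ denotes the failure probability of the preceding bound on the number of flipping bits,
and we get the following lower bound on the expected number of 0-bits within the current window
after mutation.

$$E(|\mathrm{mut}(x_B)|_0 \mid A) \ge |x_B|_0 + c\beta - \frac{2c}{n}|x_B|_0 - O\Bigl(\frac{\log n}{n}\Bigr) \qquad (2)$$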



The second part of (1), i.e., the expected loss of 0-bits outside the current window due to mutation,
is more difficult. For the sake of readability, let $k := |x_{\bar{B}}|_0 - |\mathrm{mut}(x_{\bar{B}})|_0$, the loss of 0-bits outside
the current window due to mutation. Then, we are searching for $E(k \mid A)$. We distinguish two
cases. If $k > 0$ we definitely observe a transition and accept the new search point. If $k \le 0$ a
transition does not necessarily occur. For $k < 0$ the new search point is only accepted if a transition
occurs. For $k = 0$ the search point might also be accepted depending on the changes within the
current window. We see that $E(k) \le E(k \mid A) \le E(k \mid k > 0)$ holds.

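Let $Z_0$ be the number of 0-bits flipping to one and $Z_1$ the number of 1-bits flipping to zero.
This yields $E(k \mid k > 0) = E(Z_0 - Z_1 \mid Z_0 > Z_1)$. With $|x_{\bar{B}}|_0 = \alpha n$ and $|x_{\bar{B}}|_1 = (1 - \alpha - \beta)n$ we
get the following transformation of the expected value sought.

$$\begin{aligned}
E(Z_0 - Z_1 \mid Z_0 > Z_1)
&= \sum_{i=0}^{(1-\alpha-\beta)n} \Pr(Z_1 = i) \cdot E(Z_0 - Z_1 \mid Z_0 > Z_1 \text{ and } Z_1 = i) \\
&= \sum_{i=0}^{(1-\alpha-\beta)n} \Pr(Z_1 = i) \cdot \bigl(E(Z_0 \mid Z_0 > i) - E(Z_1 \mid Z_0 > Z_1 \text{ and } Z_1 = i)\bigr) \\
&= \sum_{i=0}^{(1-\alpha-\beta)n} \Pr(Z_1 = i) \cdot \bigl(E(Z_0 \mid Z_0 > i) - i\bigr) \\
&= \sum_{i=0}^{(1-\alpha-\beta)n} \Pr(Z_1 = i) \cdot \Biggl(\sum_{j=0}^{\alpha n} j \cdot \frac{\Pr(Z_0 = j \text{ and } Z_0 > i)}{\Pr(Z_0 > i)} - i\Biggr) \\
&= \sum_{i=0}^{(1-\alpha-\beta)n} \Pr(Z_1 = i) \cdot \Biggl(\frac{\sum_{j=i+1}^{\alpha n} j \cdot \Pr(Z_0 = j)}{\Pr(Z_0 > i)} - i\Biggr) \\
&\overset{(*)}{=} \sum_{i=0}^{(1-\alpha-\beta)n} \Pr(Z_1 = i) \cdot \Biggl(\frac{(i+1)\Pr(Z_0 > i) + \sum_{j=i+1}^{\alpha n} \Pr(Z_0 > j)}{\Pr(Z_0 > i)} - i\Biggr) \\
&= \sum_{i=0}^{(1-\alpha-\beta)n} \Pr(Z_1 = i) \cdot \Biggl(1 + \frac{\sum_{j=i+1}^{\alpha n} \Pr(Z_0 > j)}{\Pr(Z_0 > i)}\Biggr) \qquad (3)
\end{aligned}$$

Note that in (*) we used the following transformation.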



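$$\begin{aligned}
\sum_{j=i+1}^{\alpha n} j \cdot \Pr(Z_0 = j)
&= (i+1) \cdot \Pr(Z_0 = i+1) + (i+1) \cdot \Pr(Z_0 = i+2) + \cdots + (i+1) \cdot \Pr(Z_0 = \alpha n) \\
&\quad + \Pr(Z_0 = i+2) + \cdots + \Pr(Z_0 = \alpha n) \\
&\quad + \Pr(Z_0 = i+3) + \cdots + \Pr(Z_0 = \alpha n) \\
&\quad + \cdots \\
&\quad + \Pr(Z_0 = \alpha n) \\
&= (i+1) \cdot \Pr(Z_0 > i) + \Pr(Z_0 > i+1) + \Pr(Z_0 > i+2) + \cdots + \Pr(Z_0 > \alpha n - 1) \\
&= (i+1) \cdot \Pr(Z_0 > i) + \sum_{j=i+1}^{\alpha n} \Pr(Z_0 > j)
\end{aligned}$$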
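
The tail-sum transformation above holds for any distribution on $\{0, \ldots, m\}$ and can be verified exactly with rational arithmetic. The sketch below is only an illustration: it uses a binomial distribution with assumed parameters $m = 12$ and $p = 1/4$ in place of $Z_0$, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(m: int, p: Fraction) -> list[Fraction]:
    """Exact probabilities Pr(Z = j) for Z ~ Bin(m, p), j = 0..m."""
    return [comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(m + 1)]


def tail(pmf: list[Fraction], t: int) -> Fraction:
    """Pr(Z > t)."""
    return sum(pmf[t + 1:], Fraction(0))


def lhs(pmf: list[Fraction], i: int) -> Fraction:
    """sum_{j=i+1}^{m} j * Pr(Z = j)."""
    return sum((j * pmf[j] for j in range(i + 1, len(pmf))), Fraction(0))


def rhs(pmf: list[Fraction], i: int) -> Fraction:
    """(i+1) * Pr(Z > i) + sum_{j=i+1}^{m} Pr(Z > j)."""
    m = len(pmf) - 1
    return (i + 1) * tail(pmf, i) + sum(
        (tail(pmf, j) for j in range(i + 1, m + 1)), Fraction(0)
    )


pmf = binomial_pmf(12, Fraction(1, 4))
print(all(lhs(pmf, i) == rhs(pmf, i) for i in range(13)))
```

Because `Fraction` arithmetic is exact, equality is tested without any tolerance; the last summand $\Pr(Z > m)$ is zero, which is why the sum may run up to $m$ or $m - 1$ interchangeably.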

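It is easy to see the following estimations for the probabilities used above.

$$\begin{aligned}
\Pr(Z_0 > i) &\le \binom{\alpha n}{i+1} \cdot \Bigl(\frac{c}{n}\Bigr)^{i+1} \\
\Pr(Z_0 > i) &\ge \binom{\alpha n}{i+1} \cdot \Bigl(\frac{c}{n}\Bigr)^{i+1} \cdot \Bigl(1 - \frac{c}{n}\Bigr)^{\alpha n - i - 1} \\
\Pr(Z_1 = i) &= \binom{(1-\alpha-\beta)n}{i} \cdot \Bigl(\frac{c}{n}\Bigr)^{i} \cdot \Bigl(1 - \frac{c}{n}\Bigr)^{(1-\alpha-\beta)n - i}
\end{aligned}$$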



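Plugging these inequalities into (3) yields the following expression for the expected loss of 0-bits
outside the current window due to mutation.

$$\begin{aligned}
E(k \mid A) \le{} & \sum_{i=0}^{(1-\alpha-\beta)n} \binom{(1-\alpha-\beta)n}{i} \Bigl(\frac{c}{n}\Bigr)^{i} \Bigl(1 - \frac{c}{n}\Bigr)^{(1-\alpha-\beta)n - i} \\
& + \sum_{i=0}^{(1-\alpha-\beta)n} \binom{(1-\alpha-\beta)n}{i} \Bigl(\frac{c}{n}\Bigr)^{i} \Bigl(1 - \frac{c}{n}\Bigr)^{(1-\alpha-\beta)n - i} \cdot \frac{\sum_{j=i+1}^{\alpha n} \binom{\alpha n}{j+1} \bigl(\frac{c}{n}\bigr)^{j+1}}{\binom{\alpha n}{i+1} \bigl(\frac{c}{n}\bigr)^{i+1} \bigl(1 - \frac{c}{n}\bigr)^{\alpha n - i - 1}} \qquad (4)
\end{aligned}$$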



We start with the second part of this term and derive the following lower bound.

Remember that we assume $c < 1/(3\alpha)$ and thus, $\alpha c < 1$ holds. Therefore, we can further
simplify the above inequality by using the following simple estimation.

Plugging all this into (4) yields a lower bound on the expected value sought.

We are now able to put the results from (2) and (5) together to get a lower bound on (1).

$$E(|x^+_B|_0 \mid A) = E(|\mathrm{mut}(x_B)|_0 \mid A) - E(|x_{\bar{B}}|_0 - |\mathrm{mut}(x_{\bar{B}})|_0 \mid A)$$

With $|x_B|_0 \le (1/10)\beta n$ and $5/(2\beta) < c < 1/(3\alpha)$ this yields the following lower bound on the drift
in the number of 0-bits.

$E(|x^+_B|_0 - |x_B|_0 \mid A)$

Since $\sqrt{e}/2 < 1$, $\lim_{n \to \infty} \sqrt{n/(n-c)} = 1$ and $\lim_{n \to \infty} \log n/n = 0$ this is bounded below by a positive
constant $\delta$ for sufficiently large values of $n$, which concludes the proof. □

Finally, we are able to prove the claimed invariance property.

Lemma 15. Let $0 < \alpha < \beta < 1$ be constants such that $\gamma > \beta/(1-2\beta)$, $c > 5/(2\beta)$, and
$\alpha < 1/(3c)$. Let $f_n$, with respect to $\alpha$ and $\beta$, be constructed as in Definition 8 for $\Pi$ chosen
uniformly at random.

Assume that for the current search point $x$ of the (1+1) EA it holds $B_x \neq \emptyset$ and the current
window contains at least $\beta n/10.5$ 0-bits. There is a constant $\kappa > 0$ such that with probability
$1 - 2^{-\Omega(n)}$ in the following $2^{\kappa n}$ generations the (1+1) EA always has at least $\beta n/11$ 0-bits in the
current window or the end of the path is reached.

Proof. First observe that the event described in the first statement of Lemma 12 occurs with
probability $1 - 2^{-\Omega(n)}$. By the union bound, the probability that the event occurs within $2^{\kappa n}$
phases is still $1 - 2^{-\Omega(n)}$ if $\kappa > 0$ is a sufficiently small constant.

We apply the drift theorem (Theorem 11) to a potential that reflects the number of 0-bits in
the current window. Consider the interval $[\beta n/11, \beta n/10.5]$ and observe that by assumption the
algorithm starts with a potential of at least $\beta n/10.5$. Using Lemma 12 with the condition from
the first paragraph and Lemma 14, if the current potential is within the interval and the end of
the path is not reached then the expected increase in the potential is bounded from below by a
positive constant.

For $j \in \mathbb{N}_0$ the probability that the potential decreases by $j$ is bounded from above by the
probability that the (1+1) EA flips at least $j$ bits. This probability is at most
$\binom{n}{j}(c/n)^j \le c^j/j! \le (ec/j)^j \le 2^{-j} \cdot 2^{2ec}$, where the last estimation is trivial for $j \le 2ec$ and obvious
otherwise. Applying Theorem 11 with $\delta := 1$ and $r := 2^{2ec}$ yields that with overwhelming
probability in $2^{\kappa n}$ generations, if again $\kappa$ is sufficiently small, the potential does not decrease
below $\beta n/11$ or the end of the path is reached. □
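
The chain of estimates used for the second condition of the drift theorem can be checked numerically in log-space. The following sketch is an illustration only; the function name, the tolerance, and the sample values of $n$, $c$ and $j$ are assumptions and not part of the proof.

```python
import math


def chain_holds(n: int, c: float, j: int, tol: float = 1e-6) -> bool:
    """Check binom(n,j)(c/n)^j <= c^j/j! <= (ec/j)^j <= 2^(-j) * 2^(2ec) via logarithms."""
    log_binom = math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
    la = log_binom + j * (math.log(c) - math.log(n))  # log of binom(n, j) * (c/n)^j
    lb = j * math.log(c) - math.lgamma(j + 1)         # log of c^j / j!
    ld = j * (1 + math.log(c) - math.log(j))          # log of (e*c/j)^j
    le = (2 * math.e * c - j) * math.log(2)           # log of 2^(-j) * 2^(2ec)
    return la <= lb + tol and lb <= ld + tol and ld <= le + tol


# Assumed sample values: n = 10000, a few constants c, and j from 1 to 2000.
print(all(chain_holds(10_000, c, j) for c in (1.0, 5.0, 33.0) for j in range(1, 2001)))
```

The check starts at $j = 1$ because $(ec/j)^j$ is not defined for $j = 0$; for $j = 0$ the claimed bound $2^{2ec} \ge 1$ holds trivially.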

All that is left to complete the proof of the main result is the fact that the path is reached 
from a random initialization, with overwhelming probability. 

Lemma 16. Let $0 < \alpha < \beta < \gamma < 1$ be constants such that $1/11 - \alpha/\beta > \gamma > \beta/(1-2\beta)$ and
$c > 5/(2\beta)$. Let $f_n$, with respect to $\alpha$, $\beta$, and $\gamma$, be constructed as in Definition 8 for $\Pi$ chosen
uniformly at random. With probability $1 - 2^{-\Omega(n)}$ the (1+1) EA with mutation probability $c/n$
for $c > 33$ optimizing $f_n$ at some point of time reaches some search point $x$ with $i^*_x \le \beta n$ and
$|x|_{B_{i^*_x}}|_0 \ge \beta n/10.5$.

Proof. Let $x$ be the first search point where $B_x \neq \emptyset$. If $|x|_{B_1}|_0 \ge \beta n/11$ then for every $j > \beta n$,
since $\beta n/11 > \alpha n + \gamma\beta n$ and $|B_1 \cap B_j| \le \gamma\beta n$, we have $j \notin B_x$. Hence, we only need to prove
that the number of 0-bits in $B_1$ does not decrease below $\beta n/11$ until $B_x$ becomes non-empty for
the first time.

The set $B_x$ is non-empty if the number of 0-bits outside of $B_1$ has decreased towards a value
at most $\alpha n$. As every mutation decreasing the number of 0-bits outside of $B_1$ is accepted and
such a mutation has a probability of at least $\alpha n \cdot c/n \cdot (1 - c/n)^{n-1} = \Omega(1)$, for any initialization
the expected number of generations until we have decreased the number of 0-bits to a value at
most $\alpha n$ is $O(n)$. In addition, by Chernoff bounds the probability that more than $n^2$ generations
are necessary is $2^{-\Omega(n)}$.

The probability that the initial search point contains at least $\beta n/10$ 0-bits in $B_1$ is $1 - 2^{-\Omega(n)}$
by Chernoff bounds. Assume that this happens and consider a situation where we have at least
$\alpha n$ 0-bits outside of $B_1$. Arguing as in the proof of Lemma 12, if the number of 0-bits is within
$\beta n/11$ and $\beta n/10$ then there is a positive drift. Instead of considering a new random BinVal
instance when the current $i^*$-value is increased, we obtain a new BinVal instance whenever
the number of 1-bits outside the window is increased. (The probability for the latter event can
even be larger than the probability for the former.) This allows us to apply Lemma 13 in the
same fashion as in the proof of Lemma 12. This results in a positive drift and applying the drift
theorem as in Lemma 15 w.r.t. the interval $[\beta n/10.4, \beta n/10]$ proves that in $n^2$ generations the
number of 0-bits in $B_1$ does not drop to or below $\beta n/10.4$, with probability $1 - 2^{-\Omega(n)}$.

Consider the mutation that creates $x$. Since $x$ is the first search point where $B_x \neq \emptyset$, its parent
must have had more than $\alpha n$ 0-bits outside of $B_1$. The probability that during mutation more
than $\beta n/1100$ bits were flipped is $2^{-\Omega(n)}$. Hence $|x|_0 \ge \alpha n - \beta n/1100 + \beta n/10.4 \ge \alpha n + \beta n/10.5$.

By Lemma 7 we then have $|x|_{B_{i^*_x}}|_0 = |x|_0 - \alpha n \ge \beta n/10.5$. As the sum of all error probabilities
is $2^{-\Omega(n)}$, the claim follows. □
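
The last inequality in this proof, $\beta n/10.4 - \beta n/1100 \ge \beta n/10.5$, leaves very little slack, so it is worth confirming with exact arithmetic. The following small sketch does so in units of $\beta n$; the function name is hypothetical.

```python
from fractions import Fraction


def zero_bit_margin() -> Fraction:
    """1/10.4 - 1/1100 - 1/10.5, i.e. the slack of the inequality in units of beta*n."""
    return Fraction(10, 104) - Fraction(1, 1100) - Fraction(10, 105)


print(zero_bit_margin())  # a small positive fraction
```

The margin is positive but tiny, which suggests that the constants $10.4$, $10.5$ and $1100$ were chosen to fit together quite tightly.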

Now we are prepared to prove Theorem 9.

Proof of Theorem 9. It is easily verified that the chosen values satisfy $0 < \alpha < \beta < \gamma < 1$,
$c > 5/(2\beta)$, $1/11 - \alpha/\beta > \gamma > \beta/(1-2\beta)$, and $\alpha < 1/(3c)$, and hence all preconditions on these
variables for Lemmas 10, 15, and 16. By Lemma 16 the (1+1) EA reaches some search point
$x$ with $i^*_x \le \beta n$ and $|x|_{B_{i^*_x}}|_0 \ge \beta n/10.5$ with overwhelming probability. Lemma 15 then states
that with probability $1 - 2^{-\Omega(n)}$ the number of 0-bits in the current window is always at least
$\beta n/11$ until the end of the path is reached or $2^{\kappa n}$ generations have passed for a sufficiently small
constant $\kappa > 0$ (which would correspond to the claimed time bound).

Given the condition on the 0-bits, by Lemma 10 the (1+1) EA increases its current $i^*$-value
by at most $\beta n$ in one generation, with probability $1 - n^{-\Omega(n)}$. The probability that this always
happens until an $i^*$-value of $L'$ is reached is at least $1 - L' \cdot n^{-\Omega(n)} = 1 - n^{-\Omega(n)}$ since $L' = 2^{\Theta(n)}$.
This implies that the (1+1) EA spends at least $L'/(\beta n) - 1 \ge 2^{\kappa n}$ generations on the path, with
probability $1 - 2^{-\Omega(n)}$, if $\kappa$ is chosen small enough. Since the sum of all error probabilities is
$2^{-\Omega(n)}$, the claim follows. □